Abstract

Background: Multifactor dimensionality reduction (MDR) is a powerful method for analysis of gene-gene 
interactions and has been successfully applied to many genetic studies of complex diseases. However, the main 
application of MDR has been limited to binary traits, while traits having ordinal features are commonly observed in 
many genetic studies (e.g., obesity classification - normal, pre-obese, mild obese and severe obese). 

Methods: We propose ordinal MDR (OMDR) to facilitate gene-gene interaction analysis for ordinal traits. As an 
alternative to balanced accuracy, the use of tau-b, a common ordinal association measure, was suggested to 
evaluate interactions. Also, we generalized cross-validation consistency (GCVC) to identify multiple best interactions. 
GCVC can be practically useful for analyzing complex traits, especially in large-scale genetic studies. 

Results and conclusions: In simulations, OMDR showed fairly good performance in terms of power, predictability
and selection stability, and outperformed MDR. For demonstration, we used real body mass index (BMI) data
and scanned 1- to 4-way interactions for ordinal and binary obesity traits derived from BMI via OMDR and MDR,
respectively. In the real data analysis, more interactions were identified for the ordinal trait than for the binary
traits. On average, the commonly identified interactions showed higher predictability for the ordinal trait than for
the binary traits. The proposed OMDR and GCVC were implemented in a C/C++ program, executables of which are
freely available for Linux, Windows and MacOS upon request for non-commercial research institutions.



Background 

Because most complex biological phenotypes are often 
affected by multiple genes and environmental factors, the 
investigation of gene-gene and gene-environment interac- 
tions can be essential in understanding the genetic archi- 
tecture of complex traits [1]. It has been pointed out that 
focusing only on marginal effects of individual genes may 
result in low power and a low replication rate in genetic 
association studies of complex traits [2,3]. 

Many different methods have been proposed to analyze
gene-gene interactions in genetic association studies [4,5];
they can be categorized into methods based on regression
modeling [6-9], pattern recognition [10,11], and data
reduction [12-14]. Recently, machine learning approaches,
such as random forest [15], support vector machine [16]
and ensemble learning [17], have been applied to gene-gene
interaction analysis.

While each method has its own advantages and disadvantages,
the multifactor dimensionality reduction (MDR)
method, a data-reduction approach, is known to have
advantages in examining high-order interactions and
detecting interactions without main effects [13,18-20], and
has been widely applied to detect gene-gene interactions
in many common diseases (see the related literature available
on http://epistasis.org). In addition, because the mode
of genetic inheritance of a common complex trait is
usually unknown a priori, MDR can be more useful for
studying a complex trait in that it does not require any
assumption about the genetic model. Since the MDR method
was first introduced, it has been extended in many directions.
Examples include family data [21], covariate adjustment
and quantitative traits [22], the quantitative measure
of multi-locus genotype risk [23], and the selection of a
parsimonious genetic model [24]. However, the applicability
of existing MDR approaches is still restricted mainly to
binary traits.

In the MDR analysis for binary traits, multi-locus geno- 
type combinations of a set of genetic variables/markers 
(e.g., single nucleotide polymorphisms or SNPs) are 
induced to two levels (e.g., high risk and low risk) of a new 
binary variable, called an MDR classifier. The induction is 
conducted via assessing odds of two phenotypic classes for 
each genotype combination. Among MDR classifiers 
representing specific marker sets, the single best MDR 
classifier is selected by evaluating their classification per- 
formances, such as cross-validation consistency (CVC). As 
a result, the corresponding set of genetic markers is identi- 
fied as having the strongest association with a trait of 
interest. 

While MDR was introduced for binary traits, there is no 
existing approach that is applicable to ordinal categorical 
traits. In many genetic association studies, examples of 
traits having ordinal features are commonly available, such 
as the obesity classification based on body mass index 
(e.g., normal, pre-obese, mild obese and severe obese), the 
diabetes diagnosis based on glucose level (e.g., normal, 
impaired glucose tolerance and diabetes) and the severity 
classification of metabolic syndrome. The current application
of MDR to these ordinal traits requires dichotomizing
them by combining several categories, which
results in a loss of ordinal information and power.

In this study, we propose an ordinal MDR (OMDR) 
approach that enables one to analyze a joint effect of mul- 
tiple genetic variables on an ordinal categorical trait. The 
proposed OMDR generates a classifier for each set of 
genetic markers in the form of a categorical variable with 
ordinal levels. The performance of each OMDR classifier 
is evaluated to select the best OMDR classifiers. For per- 
formance evaluation, we suggest the use of common ordi- 
nal association measures, such as tau-b [25], which test for 
the trend of directional association between two ordinal 
variables. With ordinal association measures, the
performance of an OMDR classifier can be evaluated by
the degree of positive association between the
observed categories of an ordinal trait and the categories
estimated by OMDR.

In addition, we propose a way to report multiple candi- 
dates of gene-gene interactions in OMDR as well as MDR 
analyses. The original MDR approach reports only a single 
best candidate. This feature can be impractical and/or 
unreasonable when causal gene-gene interactions are 
searched for complex traits, especially in a genome-wide 
scale. Because genome-wide association studies with up to
~1 million SNPs have become common, there is a growing
need for a more efficient criterion to report multiple candidates
of gene-gene interactions in the MDR analysis.



Thus, we propose a new evaluation measure, generalized
cross-validation consistency (GCVC), according to which
one can report multiple best gene-gene interactions associated
with the ordinal trait. Specifically, a pre-specified
number (K) of the best classifiers are selected via this
GCVC.

Simulations are conducted to investigate the performance of
the proposed OMDR method and GCVC. We apply
the proposed method to an ordinal obesity trait based on body
mass index (i.e., normal, pre-obese, mild obese and severe
obese) from the Age-Related Eye Disease Study data [26].

Methods 

Overall procedure of OMDR 

The OMDR procedure is the same as the MDR procedure
for binary traits and consists of multiple steps. First, the
dataset is partitioned into L (usually equal-size) subsets
for L-fold cross-validation (CV). We use L = 10
hereafter. Out of the 10 subsets, one is taken as an
independent testing dataset, and the remaining nine
are assigned to a training dataset. As a result, a
total of 10 CV datasets are generated. Second, all possible
OMDR classifiers are constructed for the corresponding
combinations of m SNPs, and the K best ones
are selected based on classification performance on the
training data for each CV set (see the following two sections
for details). Third, the best OMDR classifiers are
chosen over all CV sets for the fixed m. The predictability
of the selected OMDR classifiers is evaluated via the
average value of the evaluation measure on the testing
dataset over all 10 CVs. In addition, the selection
strength of a particular OMDR classifier is measured via
$\mathrm{GCVC}_K$, which is the number of times the classifier is
identified as one of the K best classifiers across all the
CVs. The best OMDR classifiers across the CVs are
those with the maximum predictability and maximum
$\mathrm{GCVC}_K$. Finally, the overall best OMDR classifiers
are selected, based on predictability and $\mathrm{GCVC}_K$,
among the best ones obtained in the previous steps for
various values of m. For additional details,
refer to the original MDR procedure described in the
literature [13,27,28].
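The cross-validation partition in the first step can be sketched as follows. This is a minimal illustration in Python rather than the C/C++ program described in this paper; the function names `make_cv_folds` and `train_test_splits` and the random seed are assumptions introduced here.

```python
import random


def make_cv_folds(n_samples, n_folds=10, seed=0):
    """Randomly partition sample indices into n_folds near-equal subsets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold f takes every n_folds-th shuffled index, so sizes differ by at most 1.
    return [indices[f::n_folds] for f in range(n_folds)]


def train_test_splits(folds):
    """Yield (train, test) index lists, holding out one fold at a time."""
    for held_out, test in enumerate(folds):
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test


# 1000 samples and L = 10 give ten CV datasets of 900 training / 100 testing samples.
folds = make_cv_folds(1000, n_folds=10)
splits = list(train_test_splits(folds))
```

Each element of `splits` plays the role of one CV dataset: classifiers are ranked on the training indices and their predictability is measured on the held-out indices.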

OMDR classifier construction 

Let 1, 2, ..., J be the classes of an ordinal phenotype of interest.
For example, the 'low blood pressure (BP)', 'normal' and
'high BP' classes can be viewed as classes 1, 2 and 3,
respectively, in the analysis of the BP classification trait.
Note that J = 2 for a binary trait (e.g., classes 1 and 2
for control and case, respectively).

Suppose that an m-way interaction is under consideration.
For the corresponding m SNPs, let $n_{ij}$ be the number
of individuals with the ith multi-locus genotype in phenotypic
class j, and let $n_{+j}$ be the total number of individuals in phenotypic
class j, where $i = 1, 2, \ldots, 3^m$ and $j = 1, 2, \ldots, J$. As in
MDR, the estimated OR of class j against class 1
is defined for the ith genotype as

$$\hat{\theta}_{ij} = \frac{n_{ij}/n_{i1}}{n_{+j}/n_{+1}} \qquad (1)$$

Then the OMDR classifier corresponding to the m
given SNPs assigns all individuals with the ith multi-locus
genotype to the class c(i) as follows:

$$c(i) = \arg\max_{j \in \{1, \ldots, J\}} \hat{\theta}_{ij} = \arg\max_{j \in \{1, \ldots, J\}} \left( \frac{n_{ij}}{n_{+j}} \right) \qquad (2)$$

The final classification results of the OMDR classifier
can be described in a J × J confusion matrix. The (j, k)th
cell of the confusion matrix is denoted $x_{jk}$ and indicates
the number of individuals in class j who are classified
as class k:

$$x_{jk} = \sum_{i:\, c(i) = k} n_{ij} \qquad (3)$$
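The classification rule and the confusion matrix defined above can be sketched in Python as follows. This is an illustrative sketch, not the released C/C++ program: the function names, the toy data and the handling of genotypes that were not seen when the rule was built (they are skipped) are assumptions made here. Ties are sent to the larger class, in line with the tie rule described in the discussion of this paper.

```python
from collections import Counter


def fit_omdr_rule(genotypes, classes, n_classes):
    """Map each observed multi-locus genotype i to its predicted class c(i).

    genotypes: one hashable multi-locus genotype per individual, e.g. (0, 1).
    classes:   one ordinal class label per individual, coded 1..n_classes.
    """
    cell = Counter(zip(genotypes, classes))  # n_ij
    total = Counter(classes)                 # n_+j
    rule = {}
    for g in set(genotypes):
        best_class, best_ratio = None, -1.0
        for j in range(1, n_classes + 1):
            if total[j] == 0:
                continue  # an empty class cannot be compared
            ratio = cell[(g, j)] / total[j]  # n_ij / n_+j
            if ratio >= best_ratio:  # ">=" sends ties to the larger class
                best_class, best_ratio = j, ratio
        rule[g] = best_class
    return rule


def confusion_matrix(rule, genotypes, classes, n_classes):
    """x[j-1][k-1] = number of individuals in true class j predicted as class k."""
    x = [[0] * n_classes for _ in range(n_classes)]
    for g, j in zip(genotypes, classes):
        k = rule.get(g)
        if k is not None:  # assumption: genotypes without a rule are skipped
            x[j - 1][k - 1] += 1
    return x


# Toy data: two SNPs, three classes, twelve individuals.
genotypes = [(0, 0)] * 4 + [(0, 1)] * 4 + [(1, 1)] * 4
classes = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 1]
rule = fit_omdr_rule(genotypes, classes, n_classes=3)
x = confusion_matrix(rule, genotypes, classes, n_classes=3)
```

In a cross-validation setting the rule would be fitted on the training individuals and the confusion matrix computed separately for the training and testing individuals.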

For example, see Table 1 for J = 3. Because each of
all possible multi-locus genotypes of the m given SNPs is
represented by a cell of an m-dimensional contingency
table, the construction of the corresponding OMDR
classifier allows one to reduce the m-dimensional space
to a one-dimensional space. Each constructed OMDR
classifier is evaluated by an ordinal association measure,
such as tau-b, which assesses concordance between true
classes and predicted classes [25].
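For reference, Kendall's tau-b can be computed directly from a confusion matrix by counting concordant and discordant pairs and correcting for ties in the row and column margins. The sketch below is a plain-Python illustration under that standard definition; the function name `tau_b` and the convention of returning 0 when the denominator vanishes are assumptions made here.

```python
import math


def tau_b(x):
    """Kendall's tau-b for a contingency table x (rows: true, columns: predicted)."""
    n_rows, n_cols = len(x), len(x[0])
    n = sum(map(sum, x))
    concordant = discordant = 0
    for j in range(n_rows):
        for k in range(n_cols):
            for j2 in range(j + 1, n_rows):
                for k2 in range(n_cols):
                    if k2 > k:      # both orderings agree
                        concordant += x[j][k] * x[j2][k2]
                    elif k2 < k:    # orderings disagree
                        discordant += x[j][k] * x[j2][k2]
    pairs = n * (n - 1) // 2
    row_ties = sum(r * (r - 1) // 2 for r in (sum(row) for row in x))
    col_ties = sum(c * (c - 1) // 2 for c in (sum(col) for col in zip(*x)))
    denom = math.sqrt((pairs - row_ties) * (pairs - col_ties))
    return (concordant - discordant) / denom if denom > 0 else 0.0


perfect = tau_b([[5, 0], [0, 5]])      # perfectly concordant table
reversed_ = tau_b([[0, 5], [5, 0]])    # perfectly discordant table
toy = tau_b([[3, 0, 1], [1, 3, 0], [0, 1, 3]])
```

Values close to 1 indicate that individuals in higher true classes tend to be assigned to higher predicted classes, which is the trend an OMDR classifier is expected to show.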

Top-K selection and generalized CVC

In order to report multiple causal SNP combinations, we
propose the generalized CVC based on top-K selection
($\mathrm{GCVC}_K$). After constructing all possible MDR classifiers,
each classifier is evaluated on the training
and testing datasets via a chosen evaluation measure
of predictability (e.g., tau-b). Then, a pre-specified number
(K) of the classifiers having the largest values of
the evaluation measure on the training set are selected
as the top-K classifiers for each CV dataset.

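Next, the selection results are summarized across all
CV datasets in order to suggest multiple best classifiers.
The proposed $\mathrm{GCVC}_K$ is defined as below and calculated
for each MDR classifier:

$$\mathrm{GCVC}_K = \sum_{\ell=1}^{L} I_\ell, \quad \text{where } I_\ell = \begin{cases} 1 & \text{if the MDR classifier is identified as one of the top-}K\text{ classifiers at the }\ell\text{th CV dataset} \\ 0 & \text{otherwise} \end{cases} \qquad (4)$$

Table 1 Confusion matrix for three-class ordinal phenotype, constructed by an OMDR classifier

| True class | Predicted class 1 | Predicted class 2 | Predicted class 3 |
|---|---|---|---|
| 1 | $x_{11}$ | $x_{12}$ | $x_{13}$ |
| 2 | $x_{21}$ | $x_{22}$ | $x_{23}$ |
| 3 | $x_{31}$ | $x_{32}$ | $x_{33}$ |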

$\mathrm{GCVC}_K$ indicates how many of the L training-test
sets in L-fold CV support the classifier as one of the K best
classifiers. When K = 1, $\mathrm{GCVC}_1$ is equal to the original
CVC. Note that the proposed GCVC is applicable to
both MDR and OMDR.
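The counting behind GCVC can be sketched as follows, assuming that the training-set scores of all classifiers are already available for each CV dataset. The function name `gcvc`, the classifier labels "A", "B", "C" and the scores are hypothetical and serve only to illustrate the definition.

```python
from collections import Counter


def gcvc(train_scores_per_fold, k):
    """Count, for each classifier, the CV datasets in which it ranks in the top k.

    train_scores_per_fold: one dict per CV dataset, classifier -> training score.
    """
    counts = Counter()
    for scores in train_scores_per_fold:
        top_k = sorted(scores, key=scores.get, reverse=True)[:k]
        counts.update(top_k)
    return counts


# Hypothetical training tau-b values for three classifiers over three CV datasets.
scores = [
    {"A": 0.30, "B": 0.20, "C": 0.10},
    {"A": 0.28, "B": 0.12, "C": 0.22},
    {"A": 0.15, "B": 0.31, "C": 0.05},
]
gcvc_1 = gcvc(scores, k=1)  # equals the original CVC
gcvc_2 = gcvc(scores, k=2)
```

With K = 1 only the single best classifier per CV dataset is counted, so classifier "A" is supported twice and "B" once; with K = 2 the counts grow because the runner-up in each CV dataset is counted as well.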

Via a criterion based on $\mathrm{GCVC}_K$, multiple candidates
of causal gene-gene interactions of the same order
can be reported along with their performance measures
(e.g., predictability for training and test datasets). A criterion
can be chosen according to the purpose of the analysis.
We illustrate possible choices in practice with
the following three examples. First, all combinations with
$\mathrm{GCVC}_K > 0$ are reported to search all possible candidates
(i.e., exploratory purpose). In other words, every combination
that was selected as one of the K best classifiers at least once
during CV is reported. Second, one can report all
combinations with $\mathrm{GCVC}_K \geq 9$ in 10-fold CV, intending
to identify candidates with high selection consistency (i.e.,
high confidence). This criterion means that these combinations
are selected in at least 90% of the CV datasets. Third,
100 plausible candidates are listed for further studies
by reporting the top 100 combinations that have the largest
values of $\mathrm{GCVC}_K$.
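The three reporting criteria can be expressed as one small filter. The sketch below is illustrative only: `report_candidates`, its parameters and the SNP pair names are hypothetical, and ties in GCVC are ordered alphabetically purely to make the output deterministic.

```python
def report_candidates(gcvc_counts, min_gcvc=1, top_n=None):
    """Return (combination, GCVC) pairs with GCVC >= min_gcvc, largest GCVC first."""
    kept = [(combo, v) for combo, v in gcvc_counts.items() if v >= min_gcvc]
    kept.sort(key=lambda item: (-item[1], item[0]))
    return kept if top_n is None else kept[:top_n]


# Hypothetical GCVC values from a 10-fold CV for four SNP pairs.
counts = {
    ("snp1", "snp2"): 10,
    ("snp3", "snp4"): 9,
    ("snp5", "snp6"): 3,
    ("snp7", "snp8"): 0,
}
exploratory = report_candidates(counts, min_gcvc=1)       # GCVC > 0
high_confidence = report_candidates(counts, min_gcvc=9)   # selected in at least 9 of 10 CVs
shortlist = report_candidates(counts, min_gcvc=0, top_n=2)  # fixed-size list
```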

Results 

Simulation study 

An ordinal trait was modeled with 3 classes (e.g., j = 1
for normal, j = 2 for low risk, j = 3 for high risk). The
proportion of each class in the population (i.e., $p_j = P(j)$,
the 'prevalence' of the jth class) was set as $p_1 = 0.3$, $p_2 = 0.4$
and $p_3 = 0.3$. A total of 50 SNPs were considered.
Among all the SNPs, one pair of SNPs was simulated
as a causal factor that has a two-way interaction associated
with the ordinal trait, and the remaining SNPs
were simulated as non-causal factors. For generating the
genotype data of the causal SNPs, five different interaction
patterns were developed for the ordinal trait (Figure 1).
While fixing minor allele frequencies (0.3 and 0.5), prevalences
and interaction pattern, we simulated 3 different
sets of ORs of each class for each multi-locus genotype in
order to vary the strength of genetic effects. Based on the
given ORs, the probabilities of each class for each multi-locus
genotype (i.e., $p_{j|i} = P(j \mid \text{the } i\text{th genotype})$, the 'penetrance' of
the jth class for the ith genotype) were computed under the
Hardy-Weinberg equilibrium assumption for each SNP.
As a result, 15 different genetic models were developed
(Table S1 in Additional file 1). For each genetic model,
100 replicated datasets were generated. Each simulated
dataset consists of 1000 samples. For comparison, we
further generated a binary trait by assigning the first two
classes of the simulated ordinal trait (i.e., normal and low
risk) to 'control' and the third class (i.e., high risk) to 'case'.
Thus the prevalence of case is expected to be 0.3 for the
binary trait. The proposed OMDR and the original MDR
were applied to the simulated datasets. The 10-fold CV
and tau-b were employed to assess the performance of
classifiers. All possible single-, two- and three-locus classifiers
were evaluated. Different choices of K = 1, 2, 3 were
considered to examine the effect of the choice of K on $\mathrm{GCVC}_K$.

Figure 1 Simulated patterns of 2-way interactions. White, light grey and dark grey colors indicate respectively the three classes (e.g., normal, low
risk, high risk) of an ordinal trait.
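Two ingredients of this simulation setup are easy to make concrete: the genotype frequencies implied by Hardy-Weinberg equilibrium for the stated minor allele frequencies, and the collapsing of the three-class trait into a binary one. The sketch below illustrates only these two pieces; it does not reproduce the penetrance calculation, and the independence of the two causal SNPs (used to multiply their genotype frequencies) is an assumption made here, as are the function names.

```python
from itertools import product


def hwe_genotype_freqs(maf):
    """Frequencies of 0, 1 and 2 copies of the minor allele under HWE."""
    q = maf
    return [(1 - q) ** 2, 2 * q * (1 - q), q ** 2]


def joint_genotype_freqs(maf_a, maf_b):
    """Two-locus genotype frequencies, assuming the two SNPs are independent."""
    fa, fb = hwe_genotype_freqs(maf_a), hwe_genotype_freqs(maf_b)
    return {(a, b): fa[a] * fb[b] for a, b in product(range(3), repeat=2)}


def dichotomize(ordinal_class):
    """Classes 1 and 2 (normal, low risk) -> control; class 3 (high risk) -> case."""
    return "case" if ordinal_class == 3 else "control"


joint = joint_genotype_freqs(0.3, 0.5)   # the nine two-locus genotype frequencies
prevalence = {1: 0.3, 2: 0.4, 3: 0.3}
case_prevalence = sum(p for j, p in prevalence.items() if dichotomize(j) == "case")
```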

The performance of the new OMDR method was
investigated in terms of power, fitness, predictability, and
selection stability. The empirical power was defined as
the proportion of the 100 replicated datasets in which the
true causal SNP combination was detected as the best
classifier. The fitness and predictability were measured
respectively via the average training tau-b (TRTB) and testing
tau-b (TSTB) values across the 100 replicated datasets. Average
GCVC was used to assess selection stability. First, we
observed fairly good empirical power across various
genetic models (Table 2). In particular, two-locus OMDR
classifiers showed a high empirical power of about 90% on
average, and 100% for two thirds of the genetic models. Second,
the overall predictability of two-locus classifiers
(average TSTB = 0.285) was slightly higher than or similar
to that of three-locus classifiers (average TSTB = 0.276),
while single-locus classifiers had relatively low predictability
(average TSTB = 0.139). Therefore, the proposed
OMDR tends not to choose interactions of lower order
than the true causal interaction. Third, two-locus
classifiers were selected most stably (average
GCVC = 92.0%) compared to the others, especially three-locus
classifiers (average GCVC = 48.3%). Thus the
OMDR was more likely to select two-locus classifiers than
three-locus ones as the final best model. This indicates
that the OMDR would choose true causal interactions
while avoiding over-fitting. As expected, higher-order
classifiers showed higher fitness (average TRTB = 0.162,
0.304 and 0.336, respectively, for single-, two-, and three-locus
classifiers), although the difference in average TRTB
between two- and three-locus classifiers was not great.

We compared the performance between the OMDR 
method and the original MDR method. Overall, the 
OMDR showed better performance than the MDR 
across all performance measures (Figure 2). Especially, 
we observed higher empirical power and better selection 
stability for the OMDR than for the MDR. Also, predict- 
ability and fitness indicated that the OMDR (on average, 
TSTB = 0.285, TRTB = 0.304) outperformed the MDR 
(on average, TSTB = 0.209, TRTB = 0.247) across all 
genetic models. 

The effect of K on the OMDR was examined with
K = 1, 2 and 3. Because we simulated a single
causal two-way interaction, the selected classifiers must
include false positives when K = 2 or 3 is chosen for
top-K selection. In most genetic models, the causal interaction
was identified as the best classifier (i.e., a true positive).
Thus the second best or third best classifiers
were falsely identified (i.e., false positives). We compared
the selection stability and predictability between true and
false positives. While true positives were selected with
high stability (average GCVC = 92.0-94.7%), false positives
were selected with very low stability (average GCVC
= 3.4-43.5%). These limited results imply that one can
avoid the false positives that the OMDR produces with
large K by further screening out the selected classifiers
with low GCVC. Thus a suboptimal choice of K would not
cause the OMDR to fail, although further investigation on the
choice of K is required. Note that predictability was
higher for true positives (average TSTB = 0.285-0.288)
than for false positives (average TSTB = 0.106-0.133).

Table 2 Performance of OMDR
It is presented with EP (empirical power), average GCVC (K = 1), average TSTB (testing tau-b), average TRTB (training tau-b), and their average values over
models. Note that the true causal factor was a two-locus classifier (i.e., two-way interaction). SEP indicates the EP of the single-locus classifier whose EP is largest among all
single-locus classifiers included in the true causal interaction. TEP indicates the EP of the three-locus classifier whose EP is largest among all three-locus classifiers
containing the true causal interaction.

Figure 2 Comparison between OMDR and MDR (panels: EP, GCVC, TSTB, TRTB). Performance of OMDR and MDR is compared via EP (empirical power), average GCVC (K =
1), average TSTB (testing tau-b), average TRTB (training tau-b), and their average values over models. Note that the true causal factor was a two-locus
classifier (i.e., two-way interaction), and all two-locus classifiers were searched by both methods.

Analysis of AREDS body-mass index data 

In order to demonstrate the proposed OMDR, we applied 
it to a body-mass index (BMI) category phenotype from 
the National Eye Institute Age-Related Eye Disease Study 
(AREDS). While the AREDS was originally designed to 
investigate the clinical course of age-related macular 
degeneration (AMD), the data contains other information 
on medical history, clinical status, life condition, and phy- 
sical measurements, including BMI. A total of 313 subjects 
with and 149 without AMD were genotyped on Affymetrix 
100K genotyping platform. The detailed information on 
this data is available in Mailer et al. [26]. Prior to the analysis,
we pre-processed a total of 109,924
SNPs by excluding SNPs with a total genotyping rate <
99.5%, a minor allele frequency < 0.05, or a p-value from the
Hardy-Weinberg equilibrium test < $10^{-3}$. As a result, a
total of 87,260 SNPs remained for the analysis.
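The three exclusion rules amount to a simple per-SNP filter. The sketch below assumes that the genotyping rate, minor allele frequency and Hardy-Weinberg p-value have already been computed for each SNP; the record layout, the function name `passes_qc` and the example SNP labels are hypothetical.

```python
def passes_qc(snp, min_call_rate=0.995, min_maf=0.05, min_hwe_p=1e-3):
    """Keep a SNP only if none of the three exclusion rules applies."""
    return (
        snp["call_rate"] >= min_call_rate   # genotyping rate not below 99.5%
        and snp["maf"] >= min_maf           # minor allele frequency not below 0.05
        and snp["hwe_p"] >= min_hwe_p       # HWE test p-value not below 10^-3
    )


# Hypothetical per-SNP summaries.
snps = [
    {"id": "snp_a", "call_rate": 0.999, "maf": 0.21, "hwe_p": 0.40},
    {"id": "snp_b", "call_rate": 0.990, "maf": 0.30, "hwe_p": 0.55},   # low call rate
    {"id": "snp_c", "call_rate": 1.000, "maf": 0.02, "hwe_p": 0.80},   # rare variant
    {"id": "snp_d", "call_rate": 0.998, "maf": 0.45, "hwe_p": 0.0002}, # HWE departure
]
kept = [s["id"] for s in snps if passes_qc(s)]
```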

According to the international BMI classification [29],
an adult can be classified as normal when 18.5 ≤
BMI < 25, and as overweight when BMI ≥ 25. The overweight
class is further divided into pre-obese, obese class
I, obese class II and obese class III (Table 3). Using this
classification, we defined a four-class ordinal phenotype
'OD' (i.e., normal, pre-obese, mild obese and severe obese)
to identify genes and gene-gene interactions associated
with obesity via the proposed OMDR. The sample sizes
are 141, 194, 87 and 38 for the normal, pre-obese, mild obese
and severe obese classes, respectively. In addition, two binary
phenotypes 'B1' and 'B2' (i.e., normal vs. overweight and
non-obese vs. obese, respectively) were defined and analyzed via the
current MDR for dichotomous phenotypes for
comparison.
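The phenotype definitions can be written as a single mapping from BMI to the three traits. In the sketch below the cut-offs 18.5, 25, 30 and 35 follow the classification in Table 3; treating each interval as closed on the left and returning `None` for BMI below 18.5 (a range not covered by the phenotypes in this study) are assumptions made here, as is the function name.

```python
def bmi_to_phenotypes(bmi):
    """Return (OD, B1, B2) for a BMI value, or None if BMI is below 18.5."""
    if bmi < 18.5:
        return None  # assumption: underweight is outside the defined phenotypes
    if bmi < 25:
        return ("normal", "normal", "non-obese")
    if bmi < 30:
        return ("pre-obese", "overweight", "non-obese")
    if bmi < 35:
        return ("mild obese", "overweight", "obese")     # obese class I
    return ("severe obese", "overweight", "obese")       # obese classes II and III


examples = {bmi: bmi_to_phenotypes(bmi) for bmi in (22.0, 27.5, 32.0, 41.0)}
```

The ordinal trait OD keeps four ordered levels, whereas B1 and B2 each retain only one of the cut-offs (25 and 30, respectively).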

For the ordinal phenotype (OD) and the two binary phenotypes
(B1 and B2), the proposed OMDR and the current
MDR were respectively applied to identify SNPs associated
with obesity. We used K = 300 to select multiple
best MDR classifiers for each of the 1- to 4-way interactions.
The 10-fold CV and tau-b were employed to assess the
performance of classifiers. All 87,260 SNPs were first
searched for one-way effects on obesity. Then, to reduce
the computational burden, we examined all pairwise
combinations of the top-300 SNPs with main effects.
Similarly, three- and four-way combinations were
searched only among the SNPs that were selected in the
top-300 two- and three-way interactions, respectively.
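A quick count shows how much the staged search reduces the number of classifiers to evaluate. The figures for pairs follow directly from the numbers above; for three-way combinations only an upper bound can be given, because the top-300 two-way interactions involve at most 600 distinct SNPs and the actual number is not stated. Variable names are illustrative.

```python
from math import comb

n_snps = 87260   # SNPs remaining after quality control
top = 300        # value of K used for screening

all_pairs = comb(n_snps, 2)        # exhaustive two-way search
screened_pairs = comb(top, 2)      # pairs among the top-300 main-effect SNPs
pair_reduction = all_pairs / screened_pairs

# Upper bound only: 300 two-way interactions contain at most 600 distinct SNPs.
max_three_way = comb(2 * top, 3)
```

Restricting the pairwise search to the top-300 SNPs leaves 44,850 pairs instead of several billion, at the cost of missing interactions between SNPs that show no main effect.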

For the top-300 SNPs identified with main effects, the
average GCVC was 6.67 for OD, while it was 5.98 and
5.96 for B1 and B2, respectively. We also observed that
more SNPs were identified with high GCVC for OD than
for B1 and B2. For example, the number of SNPs showing
GCVC = 10 is 58, 22 and 26 for OD, B1 and
B2, respectively. The number of SNPs with GCVC ≥ 9 for OD is also
about twice the number for B1 and B2. These patterns
are stronger for 2- to 4-way interactions (Figure S1 in
Additional file 2). While the binary MDR method identified
most interactions with low GCVC, the OMDR
approach detected a higher number of interactions with
high GCVC. Among the top-300 two-way interactions, 111 have
a GCVC of 10 for OD, while 7 and 10 do for B1 and B2,
respectively. Similarly, 92 three-way and 49 four-way interactions
show a GCVC of 10 for OD, while only a few do for
the binary phenotypes. These results indicate that, at a
high level of selection consistency, the proposed OMDR
would detect more interactions than the original MDR for
binary phenotypes.

Table 3 Obesity phenotypes based on BMI classification

| WHO classification | BMI range | Ordinal category (OD) | Binary category B1 | Binary category B2 |
|---|---|---|---|---|
| Normal | 18.5 ≤ BMI < 25 | Normal | Normal | Non-obese |
| Pre-obese | 25 ≤ BMI < 30 | Pre-obese | Overweight | Non-obese |
| Obese class I | 30 ≤ BMI < 35 | Mild obese | Overweight | Obese |
| Obese class II | 35 ≤ BMI < 40 | Severe obese | Overweight | Obese |
| Obese class III | BMI ≥ 40 | Severe obese | Overweight | Obese |

While no SNP was selected with a main effect across all
three phenotypes, two SNPs were commonly identified
for OD and B1. Fourteen SNPs identified for OD were
also selected for B2. For these commonly selected SNPs,
we investigated the average tau-b values on training and test
datasets as well as GCVC (Table 4). All of these SNPs
show better performance in both model fitting and prediction
for the ordinal phenotype (average tau-b = 0.365
and 0.317 for training and testing datasets, averaged
across the commonly selected SNPs) than for the binary
phenotypes (average tau-b = 0.175 and 0.081 for training
and testing datasets, averaged across the commonly
selected SNPs). Furthermore, we observed higher GCVC
for OD than for B1 and B2. In other words, the SNP
selection was more strongly supported by 10-fold CV
for the ordinal phenotype than for the binary phenotypes.
These results suggest that the proposed
OMDR provides more consistent results than the original
MDR for binary phenotypes.

In order to examine the biological significance, we
further investigated whether the top-300 SNPs with main
effects map to any of the known obesity-related
genes that were represented on the Affymetrix 100K genotyping
platform. For each phenotype, only three SNPs were
identified in the known obesity-related genes. However,
those obesity-related SNPs were identified more consistently
by the OMDR (average GCVC = 7.33 for OD) than
by the original MDR (average GCVC = 5 and 6.33 for B1
and B2, respectively). Note that the well-known obesity-associated
gene FTO was detected only via OMDR. Also, we
found that the ARL6 gene was detected with larger tau-b
for OD than for B2 (see rs3856570 in Table 4).



Table 4 Commonly identified SNPs with main effects on obesity.

| SNP | OD: GCVC | OD: average tau-b (train) | OD: average tau-b (test) | B1/B2: GCVC | B1/B2: average tau-b (train) | B1/B2: average tau-b (test) |
|---|---|---|---|---|---|---|
| rs1975743* | 9 | 0.371 | 0.352 | 9 | 0.184 | 0.144 |
| rs10504852* | 9 | 0.357 | 0.345 | 7 | 0.165 | 0.109 |
| rs3856570 | 10 | 0.402 | 0.397 | 10 | 0.237 | 0.233 |
| rs166315 | 10 | 0.369 | 0.367 | 5 | 0.169 | 0.035 |
| rs997682 | 10 | 0.369 | 0.363 | 6 | 0.192 | 0.246 |
| rs1980774 | 10 | 0.367 | 0.361 | 6 | 0.192 | 0.245 |
| rs354935 | 9 | 0.387 | 0.376 | 9 | 0.175 | 0.155 |
| rs10515827 | 9 | 0.360 | 0.341 | 4 | 0.162 | 0.061 |
| rs2006709 | 8 | 0.361 | 0.330 | 7 | 0.166 | 0.100 |
| rs959175 | 7 | 0.367 | 0.319 | 4 | 0.158 | -0.066 |
| rs2000862 | 7 | 0.359 | 0.289 | 5 | 0.166 | 0.008 |
| rs4780469 | 7 | 0.353 | 0.284 | 5 | 0.165 | 0.018 |
| rs1009829 | 5 | 0.361 | 0.298 | 4 | 0.169 | -0.002 |
| rs4779937 | 5 | 0.355 | 0.261 | 4 | 0.166 | 0.029 |
| rs9297682 | 4 | 0.358 | 0.205 | 5 | 0.164 | 0.019 |
| rs10508706 | 4 | 0.353 | 0.192 | 6 | 0.166 | 0.071 |

SNPs with * were identified for OD and B1; SNPs with no * were identified for
OD and B2.



In addition, various values of K (K = 1, 2, ..., 1000) were
further used to search for possible causal SNPs with main
effects via the OMDR and the original MDR methods. As
we increased K (i.e., selected a larger number
of possible causal SNPs), we identified more obesity-related
SNPs using the OMDR approach than using the
current MDR, and the gap appeared to widen. For example,
with K = 1000, we identified four more SNPs in known
obesity-related genes for OD than for B1 and B2.

Conclusions and discussion 

In this paper, we developed the OMDR approach, which
facilitates MDR analysis for an ordinal phenotype. The
construction process for OMDR classifiers is a straightforward
extension of the process for the existing MDR classifiers.
To select good classifiers, the performance of the
OMDR classifiers has to be assessed via an evaluation
measure. We proposed the use of an ordinal association
measure, specifically tau-b, for several reasons. First, tau-b,
along with the likelihood ratio and normalized mutual information,
is known to outperform other evaluation
measures in MDR, including balanced accuracy, misclassification
error, specificity and sensitivity [28,30]. Second,
tau-b and other ordinal association measures are
natural choices to assess the association between the true
and the predicted classes (see Table 1), both of which are
ordinal, in that they utilize the information on the positive
trend in classification results. In addition, tau-b can
be readily applied to OMDR without modification.

While designed for the analysis of genuine ordinal categorical
traits, the OMDR method can also be used to analyze
a continuous trait by approximating it as an ordinal
categorical trait. Currently, MDR analysis for a continuous
trait is conducted mostly by binarizing it with a certain
cut-off. Compared to the binary approximation, the ordinal
approximation can be more powerful because it preserves
more information on the continuous trait. The empirical
study with real data demonstrated that the OMDR
approach would produce more consistent results and be
more powerful than the original MDR approach for binary
traits, in terms of GCVC and the number of classifiers
identified with high selection consistency, respectively.

Nowadays, genome-wide association studies produce
genotype data on up to ~1 million SNPs. Reporting
a single best candidate is impractical and/or unreasonable
when causal gene-gene interactions are searched for
complex traits on a genome-wide scale. Thus, we proposed
GCVC with top-K selection to report multiple candidates
of gene-gene interactions in OMDR as well as MDR
analyses. When one searches for few but possibly strong
candidates for gene-gene interactions, a small value of K
would be appropriate. On the other hand, a large value of
K can be used for detecting many candidates, including
ones with mild effects on traits. Note that the choice of K
would be critical for both power increase and false-positive
control, which requires further investigation. However,
our simulations suggested that false positives can be
screened out by their very low GCVC values and low predictability
values. We also investigated the null distribution of
$\mathrm{GCVC}_K$ with K = 1, 2, 3, based on a small simulation,
and found that the choice of K did not dramatically
affect the null distribution of $\mathrm{GCVC}_K$ (data not shown).

The original MDR for binary traits (e.g., disease status)
compares the estimated ORs between two classes (e.g.,
case vs. control), and takes the class with the larger estimated
OR as the predicted class. When the estimated ORs
are the same for both classes, one class is usually specified for
the prediction purpose (e.g., high risk). Similarly, in the OMDR
approach more than one class can happen to have the same maximum
value of the estimated OR (i.e., a tie in the estimated OR
among classes, giving multiple values of c(i)).
There are many possible options to
address this tie problem. For example, the class with the
smallest or largest index among the tied classes can be used
as the predicted class. In our analysis, we chose the largest
class for prediction in tied cases, following the original
MDR approach.